{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_01_3_python_collections.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T81-558: Applications of Deep Neural Networks\n",
    "**Module 1: Python Preliminaries**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 1 Material\n",
    "\n",
    "* Part 1.1: Course Overview [[Video]](https://www.youtube.com/watch?v=IzZSwS45vt4&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_01_1_overview.ipynb)\n",
    "* Part 1.2: Introduction to Python [[Video]](https://www.youtube.com/watch?v=czq5d53vKvo&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_01_2_intro_python.ipynb)\n",
    "* **Part 1.3: Python Lists, Dictionaries, Sets and JSON** [[Video]](https://www.youtube.com/watch?v=kcGx2I5akSs&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_01_3_python_collections.ipynb)\n",
    "* Part 1.4: File Handling [[Video]](https://www.youtube.com/watch?v=FSuSLCMgCZc&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_01_4_python_files.ipynb)\n",
    "* Part 1.5: Functions, Lambdas, and Map/Reduce [[Video]](https://www.youtube.com/watch?v=jQH1ZCSj6Ng&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_01_5_python_functional.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running the correct version of TensorFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    from google.colab import drive\n",
    "    %tensorflow_version 2.x\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "    \n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1.3: Python Lists, Dictionaries, Sets, and JSON\n",
    "\n",
    "Like most modern programming languages, Python includes Lists, Sets, Dictionaries, and other data structures as built-in types. The syntax appearance of both of these is similar to JSON. Python and JSON compatibility is discussed later in this module. This course will focus primarily on Lists, Sets, and Dictionaries. It is essential to understand the differences between these three fundamental collection types.\n",
    "\n",
    "* **Dictionary** - A dictionary is a mutable unordered collection that Python indexes with name and value pairs.\n",
    "* **List** - A list is a mutable ordered collection that allows duplicate elements.\n",
    "* **Set** - A set is a mutable unordered collection with no duplicate elements.\n",
    "* **Tuple** - A tuple is an immutable ordered collection that allows duplicate elements.\n",
    "\n",
    "Most Python collections are mutable, meaning the program can add and remove elements after definition. An immutable collection cannot add or remove items after definition. It is also essential to understand that an ordered collection means that items maintain their order as the program adds them to a collection. This order might not be any specific ordering, such as alphabetic or numeric.\n",
    "\n",
    "Lists and tuples are very similar in Python and are often confused. The significant difference is that a list is mutable, but a tuple isn’t. So, we include a list when we want to contain similar items and a tuple when we know what information goes into it ahead of time.\n",
    "\n",
    "Many programming languages contain a data collection called an array. The array type is noticeably absent in Python. Generally, the programmer will use a list in place of an array in Python. Arrays in most programming languages were fixed-length, requiring the program to know the maximum number of elements needed ahead of time. This restriction leads to the infamous array-overrun bugs and security issues. The Python list is much more flexible in that the program can dynamically change the size of a list.\n",
    "\n",
    "The next sections will look at each collection type in more detail.\n",
    "\n",
    "### Lists and Tuples\n",
    "\n",
    "For a Python program, lists and tuples are very similar. Both lists and tuples hold an ordered collection of items. It is possible to get by as a programmer using only lists and ignoring tuples.\n",
    "\n",
    "The primary difference that you will see syntactically is that a list is enclosed by square braces [], and a tuple is enclosed by parenthesis (). The following code defines both list and tuple."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['a', 'b', 'c', 'd']\n",
      "('a', 'b', 'c', 'd')\n"
     ]
    }
   ],
   "source": [
    "l = ['a', 'b', 'c', 'd']\n",
    "t = ('a', 'b', 'c', 'd')\n",
    "\n",
    "print(l)\n",
    "print(t)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The primary difference you will see programmatically is that a list is mutable, which means the program can change it. A tuple is immutable, which means the program cannot change it. The following code demonstrates that the program can change a list. This code also illustrates that Python indexes lists starting at element 0. Accessing element one modifies the second element in the collection. One advantage of tuples over lists is that tuples are generally slightly faster to iterate over than lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['a', 'changed', 'c', 'd']\n"
     ]
    }
   ],
   "source": [
    "l[1] = 'changed'\n",
    "#t[1] = 'changed' # This would result in an error\n",
    "\n",
    "print(l)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like many languages, Python has a for-each statement.  This statement allows you to loop over every element in a collection, such as a list or a tuple."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a\n",
      "changed\n",
      "c\n",
      "d\n"
     ]
    }
   ],
   "source": [
    "# Iterate over a collection.\n",
    "for s in l:\n",
    "    print(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The **enumerate** function is useful for enumerating over a collection and having access to the index of the element that we are currently on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0:a\n",
      "1:changed\n",
      "2:c\n",
      "3:d\n"
     ]
    }
   ],
   "source": [
    "# Iterate over a collection, and know where your index.  (Python is zero-based!)\n",
    "for i,l in enumerate(l):\n",
    "    print(f\"{i}:{l}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A **list** can have multiple objects added, such as strings.  Duplicate values are allowed.  **Tuples** do not allow the program to add additional objects after definition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['a', 'b', 'c', 'c']\n"
     ]
    }
   ],
   "source": [
    "# Manually add items, lists allow duplicates\n",
    "c = []\n",
    "c.append('a')\n",
    "c.append('b')\n",
    "c.append('c')\n",
    "c.append('c')\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ordered collections, such as lists and tuples, allow you to access an element by its index number, as done in the following code. Unordered collections, such as dictionaries and sets, do not allow the program to access them in this way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b\n"
     ]
    }
   ],
   "source": [
    "print(c[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A **list** can have multiple objects added, such as strings. Duplicate values are allowed. Tuples do not allow the program to add additional objects after definition. The programmer must specify an index for the insert function, an index. These operations are not allowed for tuples because they would result in a change."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['a0', 'a', 'b', 'c']\n",
      "['a0', 'a', 'c']\n",
      "['a', 'c']\n"
     ]
    }
   ],
   "source": [
    "# Insert\n",
    "c = ['a', 'b', 'c']\n",
    "c.insert(0, 'a0')\n",
    "print(c)\n",
    "# Remove\n",
    "c.remove('b')\n",
    "print(c)\n",
    "# Remove at index\n",
    "del c[0]\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sets\n",
    "A Python **set** holds an unordered collection of objects, but sets do *not* allow duplicates.  If a program adds a duplicate item to a set, only one copy of each item remains in the collection.  Adding a duplicate item to a set does not result in an error.   Any of the following techniques will define a set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'c', 'a', 'b'}\n"
     ]
    }
   ],
   "source": [
    "s = set()\n",
    "s = { 'a', 'b', 'c'}\n",
    "s = set(['a', 'b', 'c'])\n",
    "print(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A **list** is always enclosed in square braces [], a **tuple** in parenthesis (), and similarly a **set** is enclosed in curly braces.  Programs can add items to a **set** as they run.  Programs can dynamically add items to a **set** with the **add** function.  It is important to note that the **append** function adds items to lists, whereas the **add** function adds items to a **set**.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'c', 'a', 'b'}\n"
     ]
    }
   ],
   "source": [
    "# Manually add items, sets do not allow duplicates\n",
    "# Sets add, lists append.  I find this annoying.\n",
    "c = set()\n",
    "c.add('a')\n",
    "c.add('b')\n",
    "c.add('c')\n",
    "c.add('c')\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Maps/Dictionaries/Hash Tables\n",
    "\n",
    "Many programming languages include the concept of a map, dictionary, or hash table.  These are all very related concepts.  Python provides a dictionary that is essentially a collection of name-value pairs.  Programs define dictionaries using curly braces, as seen here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'name': 'Jeff', 'address': '123 Main'}\n",
      "Jeff\n",
      "Name is defined\n",
      "age undefined\n"
     ]
    }
   ],
   "source": [
    "d = {'name': \"Jeff\", 'address':\"123 Main\"}\n",
    "print(d)\n",
    "print(d['name'])\n",
    "\n",
    "if 'name' in d:\n",
    "    print(\"Name is defined\")\n",
    "\n",
    "if 'age' in d:\n",
    "    print(\"age defined\")\n",
    "else:\n",
    "    print(\"age undefined\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Be careful that you do not attempt to access an undefined key, as this will result in an error.  You can check to see if a key is defined, as demonstrated above.  You can also access the dictionary and provide a default value, as the following code demonstrates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'default'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d.get('unknown_key', 'default')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also access the individual keys and values of a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Key: dict_keys(['name', 'address'])\n",
      "Values: dict_values(['Jeff', '123 Main'])\n"
     ]
    }
   ],
   "source": [
    "d = {'name': \"Jeff\", 'address':\"123 Main\"}\n",
    "# All of the keys\n",
    "print(f\"Key: {d.keys()}\")\n",
    "\n",
    "# All of the values\n",
    "print(f\"Values: {d.values()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dictionaries and lists can be combined. This syntax is closely related to [JSON](https://en.wikipedia.org/wiki/JSON).  Dictionaries and lists together are a good way to build very complex data structures.  While Python allows quotes (\") and apostrophe (') for strings, JSON only allows double-quotes (\"). We will cover JSON in much greater detail later in this module.\n",
    "\n",
    "The following code shows a hybrid usage of dictionaries and lists.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'name': 'Jeff & Tracy Heaton', 'pets': ['Wynton', 'Cricket', 'Hickory']}, {'name': 'John Smith', 'pets': ['rover']}, {'name': 'Jane Doe'}]\n",
      "Jeff & Tracy Heaton:['Wynton', 'Cricket', 'Hickory']\n",
      "John Smith:['rover']\n",
      "Jane Doe:no pets\n"
     ]
    }
   ],
   "source": [
    "# Python list & map structures\n",
    "customers = [\n",
    "    {\"name\": \"Jeff & Tracy Heaton\", \"pets\": [\"Wynton\", \"Cricket\", \n",
    "        \"Hickory\"]},\n",
    "    {\"name\": \"John Smith\", \"pets\": [\"rover\"]},\n",
    "    {\"name\": \"Jane Doe\"}\n",
    "]\n",
    "\n",
    "print(customers)\n",
    "\n",
    "for customer in customers:\n",
    "    print(f\"{customer['name']}:{customer.get('pets', 'no pets')}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The variable **customers** is a list that holds three dictionaries that represent customers.  You can think of these dictionaries as records in a table. The fields in these individual records are the keys of the dictionary.  Here the keys **name** and **pets** are fields. However, the field **pets** holds a list of pet names.  There is no limit to how deep you might choose to nest lists and maps.  It is also possible to nest a map inside of a map or a list inside of another list."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## More Advanced Lists\n",
    "\n",
    "Several advanced features are available for lists that this section introduces. One such function is **zip**.  Two lists can be combined into a single list by the **zip** command.  The following code demonstrates the **zip** command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<zip object at 0x000001802A7A2E08>\n"
     ]
    }
   ],
   "source": [
    "a = [1,2,3,4,5]\n",
    "b = [5,4,3,2,1]\n",
    "\n",
    "print(zip(a,b))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the results of the **zip** function, we convert the returned zip object into a list. As you can see, the **zip** function returns a list of tuples.  Each tuple represents a pair of items that the function zipped together.  The order in the two lists was maintained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]\n"
     ]
    }
   ],
   "source": [
    "a = [1,2,3,4,5]\n",
    "b = [5,4,3,2,1]\n",
    "\n",
    "print(list(zip(a,b)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The usual method for using the **zip** command is inside of a for-loop.  The following code shows how a for-loop can assign a variable to each collection that the program is iterating. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 - 5\n",
      "2 - 4\n",
      "3 - 3\n",
      "4 - 2\n",
      "5 - 1\n"
     ]
    }
   ],
   "source": [
    "a = [1,2,3,4,5]\n",
    "b = [5,4,3,2,1]\n",
    "\n",
    "for x,y in zip(a,b):\n",
    "    print(f'{x} - {y}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usually, both collections will be of the same length when passed to the **zip** command.  It is not an error to have collections of different lengths.  As the following code illustrates, the **zip** command will only process elements up to the length of the smaller collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(1, 5), (2, 4), (3, 3)]\n"
     ]
    }
   ],
   "source": [
    "a = [1,2,3,4,5]\n",
    "b = [5,4,3]\n",
    "\n",
    "print(list(zip(a,b)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes you may wish to know the current numeric index when a for-loop is iterating through an ordered collection.  Use the **enumerate** command to track the index location for a collection element.  Because the **enumerate** command deals with numeric indexes of the collection, the zip command will assign arbitrary indexes to elements from unordered collections.\n",
    "\n",
    "Consider how you might construct a Python program to change every element greater than 5 to the value of 5.  The following program performs this transformation.  The enumerate command allows the loop to know which element index it is currently on, thus allowing the program to be able to change the value of the current element of the collection. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2, 5, 3, 5, 5, 3, 2, 1]\n"
     ]
    }
   ],
   "source": [
    "a = [2, 10, 3, 11, 10, 3, 2, 1]\n",
    "for i, x in enumerate(a):\n",
    "    if x>5:\n",
    "        a[i] = 5\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The comprehension command can dynamically build up a list.  The comprehension below counts from 0 to 9 and adds each value (multiplied by 10) to a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]\n"
     ]
    }
   ],
   "source": [
    "lst = [x*10 for x in range(10)]\n",
    "print(lst)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A dictionary can also be a comprehension.  The general format for this is:  \n",
    "\n",
    "```\n",
    "dict_variable = {key:value for (key,value) in dictonary.items()}\n",
    "```\n",
    "\n",
    "A common use for this is to build up an index to symbolic column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'col-zero': 0, 'col-one': 1, 'col-two': 2, 'col-three': 3}\n"
     ]
    }
   ],
   "source": [
    "text = ['col-zero','col-one', 'col-two', 'col-three']\n",
    "lookup = {key:value for (value,key) in enumerate(text)}\n",
    "print(lookup)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can be used to easily find the index of a column by name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The index of \"col-two\" is 2\n"
     ]
    }
   ],
   "source": [
    "print(f'The index of \"col-two\" is {lookup[\"col-two\"]}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## An Introduction to JSON\n",
    "\n",
    "Data stored in a CSV file must be flat; it must fit into rows and columns. Most people refer to this type of data as structured or tabular. This data is tabular because the number of columns is the same for every row. Individual rows may be missing a value for a column; however, these rows still have the same columns.  \n",
    "\n",
    "This data is convenient for machine learning because most models, such as neural networks, also expect incoming data to be of fixed dimensions. Real-world information is not always so tabular. Consider if the rows represent customers. These people might have multiple phone numbers and addresses. How would you describe such data using a fixed number of columns? It would be useful to have a list of these courses in each row that can be variable length for each row or student.\n",
    "\n",
    "JavaScript Object Notation (JSON) is a standard file format that stores data in a hierarchical format similar to eXtensible Markup Language (XML). JSON is nothing more than a hierarchy of lists and dictionaries. Programmers refer to this sort of data as semi-structured data or hierarchical data. The following is a sample JSON file.\n",
    "\n",
    "```\n",
    "{\n",
    "  \"firstName\": \"John\",\n",
    "  \"lastName\": \"Smith\",\n",
    "  \"isAlive\": true,\n",
    "  \"age\": 27,\n",
    "  \"address\": {\n",
    "    \"streetAddress\": \"21 2nd Street\",\n",
    "    \"city\": \"New York\",\n",
    "    \"state\": \"NY\",\n",
    "    \"postalCode\": \"10021-3100\"\n",
    "  },\n",
    "  \"phoneNumbers\": [\n",
    "    {\n",
    "      \"type\": \"home\",\n",
    "      \"number\": \"212 555-1234\"\n",
    "    },\n",
    "    {\n",
    "      \"type\": \"office\",\n",
    "      \"number\": \"646 555-4567\"\n",
    "    },\n",
    "    {\n",
    "      \"type\": \"mobile\",\n",
    "      \"number\": \"123 456-7890\"\n",
    "    }\n",
    "  ],\n",
    "  \"children\": [],\n",
    "  \"spouse\": null\n",
    "}\n",
    "```\n",
    "\n",
    "The above file may look somewhat like Python code.  You can see curly braces that define dictionaries and square brackets that define lists.  JSON does require there to be a single root element.  A list or dictionary can fulfill this role.  JSON requires double-quotes to enclose strings and names.  Single quotes are not allowed in JSON.\n",
    "\n",
    "JSON files are always legal JavaScript syntax.  JSON is also generally valid as Python code, as demonstrated by the following Python program."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "jsonHardCoded = {\n",
    "  \"firstName\": \"John\",\n",
    "  \"lastName\": \"Smith\",\n",
    "  \"isAlive\": True,\n",
    "  \"age\": 27,\n",
    "  \"address\": {\n",
    "    \"streetAddress\": \"21 2nd Street\",\n",
    "    \"city\": \"New York\",\n",
    "    \"state\": \"NY\",\n",
    "    \"postalCode\": \"10021-3100\"\n",
    "  },\n",
    "  \"phoneNumbers\": [\n",
    "    {\n",
    "      \"type\": \"home\",\n",
    "      \"number\": \"212 555-1234\"\n",
    "    },\n",
    "    {\n",
    "      \"type\": \"office\",\n",
    "      \"number\": \"646 555-4567\"\n",
    "    },\n",
    "    {\n",
    "      \"type\": \"mobile\",\n",
    "      \"number\": \"123 456-7890\"\n",
    "    }\n",
    "  ],\n",
    "  \"children\": [],\n",
    "  \"spouse\": None\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Generally, it is better to read JSON from files, strings, or the Internet than hard coding, as demonstrated here.  However, for internal data structures, sometimes such hard-coding can be useful.\n",
    "\n",
    "Python contains support for JSON.  When a Python program loads a JSON  the root list or dictionary is returned, as demonstrated by the following code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "First name: Jeff\n",
      "Last name: Heaton\n"
     ]
    }
   ],
   "source": [
    "import json\n",
    "\n",
    "json_string = '{\"first\":\"Jeff\",\"last\":\"Heaton\"}'\n",
    "obj = json.loads(json_string)\n",
    "print(f\"First name: {obj['first']}\")\n",
    "print(f\"Last name: {obj['last']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python programs can also load JSON from a file or URL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'firstName': 'John', 'lastName': 'Smith', 'isAlive': True, 'age': 27, 'address': {'streetAddress': '21 2nd Street', 'city': 'New York', 'state': 'NY', 'postalCode': '10021-3100'}, 'phoneNumbers': [{'type': 'home', 'number': '212 555-1234'}, {'type': 'office', 'number': '646 555-4567'}, {'type': 'mobile', 'number': '123 456-7890'}], 'children': [], 'spouse': None}\n"
     ]
    }
   ],
   "source": [
    "import requests\n",
    "\n",
    "r = requests.get(\"https://raw.githubusercontent.com/jeffheaton/\"\n",
    "                 +\"t81_558_deep_learning/master/person.json\")\n",
    "print(r.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python programs can easily generate JSON strings from Python objects of dictionaries and lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\"first\": \"Jeff\", \"last\": \"Heaton\"}\n"
     ]
    }
   ],
   "source": [
    "python_obj = {\"first\":\"Jeff\",\"last\":\"Heaton\"}\n",
    "print(json.dumps(python_obj))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A data scientist will generally encounter JSON when they access web services to get their data.  A data scientist might use the techniques presented in this section to convert the semi-structured JSON data into tabular data for the program to use with a model such as a neural network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3.9 (tensorflow)",
   "language": "python",
   "name": "tensorflow"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
